A reservoir bubble point pressure prediction model using the Adaptive Neuro-Fuzzy Inference System (ANFIS) technique with trend analysis

The bubble point pressure (Pb) can be obtained from pressure-volume-temperature (PVT) measurements; nonetheless, these measurements have drawbacks such as time, cost, and the difficulty of conducting experiments at high-pressure, high-temperature conditions. Therefore, numerous attempts have been made to develop accurate models for predicting the Pb using several approaches, such as regression and machine learning. However, some previous models did not include a trend analysis to show that the relationships between inputs and outputs follow the proper physical behavior. Thus, this study aims to build a robust and more accurate model to predict the Pb by combining, for the first time, the adaptive neuro-fuzzy inference system (ANFIS) with trend analysis. More than 700 global datasets were used to develop and validate the model. The proposed ANFIS model is compared with 21 existing models using statistical error measures: correlation coefficient (R), standard deviation (SD), average absolute percentage relative error (AAPRE), average percentage relative error (APRE), and root mean square error (RMSE). The ANFIS model shows the proper relationships between the independent and dependent parameters, which indicates correct physical behavior. The ANFIS model ranked first, outperforming all 21 models with the highest R of 0.994 and the lowest AAPRE, APRE, SD, and RMSE of 6.38%, -0.99%, 0.074 psi, and 9.73 psi, respectively. The second-ranked model has an R, AAPRE, APRE, SD, and RMSE of 0.9724, 9%, -1.58%, 0.095 psi, and 13.04 psi, respectively. It is concluded that the proposed ANFIS model follows the correct physical behavior with higher accuracy than all studied models.


Introduction
the Pb based on Rs, γg, API, and Tf. Yang et al. [31] presented a correlation for predicting the Pb using artificial intelligence algorithms, namely neural networks. Alakbari et al. [32] built their model with the Rs, γg, API, and Tf as inputs and more than 700 datasets, and they reported an average absolute percent relative error of 8.422% and a correlation coefficient (R) of 0.990. Nonetheless, the accuracy of the previous models in predicting the Pb still needs improvement.
Numerous researchers have successfully applied the adaptive neuro-fuzzy inference system (ANFIS) method in engineering calculations. Wind turbine noise was predicted using ANFIS [33]. The ionic and electronic conductivity of materials was estimated using ANFIS [34]. Ayoub et al. [35] developed a model to obtain the drilling rate of penetration using the ANFIS technique. The wind power density was determined by applying ANFIS [36]. Sambo et al. [37] used ANFIS to determine water saturation from seismic attributes. Hamdi and Chenxi [38] proposed an ANFIS model to predict the CO2 minimum miscibility pressure (MMP) with higher accuracy. In a recent study, Ayoub et al. [39] applied ANFIS to model the isothermal oil compressibility below the Pb.
This research aims to build a robust and more accurate model for determining the Pb using the ANFIS method with trend analysis (TrA). The only previous attempt to apply ANFIS to develop Pb correlations is the one proposed by Shojaei et al. [40], who used 750 data points to build their Pb model. However, they did not perform TrA to show that their model follows the proper physical behavior. Therefore, in this study, a robust and highly accurate ANFIS model was developed to predict the Pb with TrA. More than 700 global datasets were used with the ANFIS method, and trend analysis was applied to check that the relationships between the independent variables (Rs, γg, API, and Tf) and the dependent variable (Pb) indicate the correct physical behavior. This is the first time that ANFIS has been combined with trend analysis to robustly and accurately determine the Pb. Moreover, statistical error analyses such as R were used to compare the accuracy of the ANFIS model with that of all existing models.

Data collection and pre-processing
More than seven hundred datasets were gathered from existing sources [11,24,28] to build the proposed ANFIS model. The Rs, γg, API, and Tf are used as independent parameters in this study because most studies in the literature consider these parameters as inputs; Hanafy et al. [21], however, used only the Rs as the input to predict the Pb (Table 1). Furthermore, the correlation coefficient (R) between each independent parameter (Rs, γg, API, and Tf) and the dependent parameter (Pb) was calculated to evaluate the relative importance of the independent parameters, as shown in Fig 2. The R of 0.876 between the Rs and the Pb means that the Pb is a strong function of the Rs. As displayed in Fig 2, the R of -0.513 for the γg and

PLOS ONE
the Pb indicates that the Pb is a moderate function of the γg, and the R values of 0.383 and 0.315 for the API and Tf indicate that the Pb is a weak function of the API and Tf. Before the ANFIS model was applied, the collected data were split into two parts: 70% for training and 30% for testing the proposed ANFIS model. The statistical description of the training and testing datasets is shown in Table 2. As shown in the table, the training and testing datasets cover the same ranges, so the ANFIS model is built and evaluated over the same data ranges. Because it is essential to avoid over-fitting and under-fitting, data randomization was used to address these issues. In addition, all parameters in the training and testing datasets were normalized to the range from -1 to 1 using the following equation:
$$Y = \frac{(Y_{max} - Y_{min})(X - X_{min})}{X_{max} - X_{min}} + Y_{min} \quad (1)$$
Where: Y: the normalized parameter. Y max: the maximum normalized value (1). Y min: the minimum normalized value (-1). X: the input variable. X min: the minimum of the variable. X max: the maximum of the variable.
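The randomized 70/30 split and the scaling to the range from -1 to 1 can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' MATLAB workflow: the function names, the fixed seed, and the sample Rs values are assumptions introduced here, and the scaling uses the standard min-max form implied by the definitions of Y, Y min, Y max, X, X min, and X max.

```python
import random

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows, then split it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def normalize(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Scale x from [x_min, x_max] to [y_min, y_max] with min-max scaling."""
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

# Illustrative Rs values in SCF/STB (made up for this sketch, not the study's data).
rs_values = [26.0, 150.0, 400.0, 800.0, 1200.0, 1600.0, 2000.0, 2400.0, 2800.0, 3200.0]
rs_min, rs_max = min(rs_values), max(rs_values)

train, test = split_train_test(rs_values)
train_scaled = [normalize(v, rs_min, rs_max) for v in train]
test_scaled = [normalize(v, rs_min, rs_max) for v in test]
```

Taking the minimum and maximum over the whole dataset is a simplification made here; it keeps both subsets inside the same scaled range, which matches the statement that the training and testing datasets cover the same ranges.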

Proposed ANFIS model strategy
ANFIS is a combination of artificial neural networks (ANN) and fuzzy logic (FL), and it is one of the neural networks that use the Takagi-Sugeno fuzzy inference system. The Takagi-Sugeno fuzzy model applies two fuzzy rules [41]:
Rule 1: If $x_1$ is $A_1$ and $x_2$ is $B_1$, then
$$f_1 = p_1 x_1 + q_1 x_2 + r_1 \quad (2)$$
is used.
Rule 2: If $x_1$ is $A_2$ and $x_2$ is $B_2$, then
$$f_2 = p_2 x_1 + q_2 x_2 + r_2 \quad (3)$$
is applied.
where: x 1 and x 2: inputs. A 1, A 2, B 1, and B 2: membership values. p 1, q 1, r 1, p 2, q 2, and r 2: parameters of the output functions f 1 and f 2, respectively.
As displayed in Fig 3, the ANFIS structure consists of five layers: the fuzzification layer, rule layer, normalization layer, defuzzification layer, and output layer. ANFIS is a multilayer feedforward neural network with supervised learning capability (a hybrid learning rule) [42,43]. For the Sugeno fuzzy reasoning, the default defuzzification technique, a weighted average of all rule outputs, was applied. For the aggregation technique used, the fuzzified input values can be combined as an algebraic sum of the consequent fuzzy sets. First, the input characteristics are mapped to input membership functions. Then, they pass to the rules. After that, they map to a set of output characteristics. Next, they pass to output membership functions. Finally, the output membership functions provide the output [44].
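To make the five-layer flow concrete, the following Python sketch evaluates a two-rule, first-order Sugeno system with Gaussian membership functions. It is an illustration under stated assumptions rather than the model built in this study: the product is assumed as the AND operator for the rule firing strength, and all membership parameters and consequent coefficients are hypothetical.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership value of x for a fuzzy set with the given center and width."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_output(x1, x2, rules):
    """Evaluate a first-order Sugeno system; each rule holds A, B, p, q, r."""
    # Layers 1-2: fuzzify both inputs and form each rule's firing strength.
    # The product is assumed here as the AND operator.
    strengths = [
        gaussian_mf(x1, *rule["A"]) * gaussian_mf(x2, *rule["B"]) for rule in rules
    ]
    # Layer 3: normalize the firing strengths so they sum to 1.
    total = sum(strengths)
    weights = [s / total for s in strengths]
    # Layer 4: linear rule outputs f_i = p_i*x1 + q_i*x2 + r_i.
    outputs = [rule["p"] * x1 + rule["q"] * x2 + rule["r"] for rule in rules]
    # Layer 5: weighted average of all rule outputs.
    return sum(w * f for w, f in zip(weights, outputs))

# Hypothetical membership parameters (center, sigma) and consequent coefficients.
rules = [
    {"A": (-1.0, 0.5), "B": (-1.0, 0.5), "p": 1.0, "q": 1.0, "r": 2.0},
    {"A": (1.0, 0.5), "B": (1.0, 0.5), "p": 1.0, "q": 1.0, "r": 4.0},
]
y = sugeno_output(0.0, 0.0, rules)
```

At the point midway between the two rule centers both rules fire equally, so the output is the plain average of the two rule outputs; near one rule's center, that rule dominates. The sketch does not guard against all firing strengths being zero, which a production implementation would need to handle.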
The ANFIS technique has several advantages over other methods. It shows a better learning ability and can perform highly non-linear mapping. It has fewer adjustable parameters than other machine learning methods need. Its structure allows for parallel computation. Its networks provide a well-structured knowledge representation and allow better integration with other control design methods [45]. ANFIS combines ANN and FL in a single tool, which enables a quicker decision about the mapped relationship between the feature and target parameters [46]. ANFIS also benefits from a shorter training time, not only because of its smaller dimensions but also because the network is initialized with parameters related to the problem domain [47].
The proposed ANFIS model in this work was built using MATLAB R2019b. Fig 4 shows the ANFIS output generated from MATLAB R2019b. The membership function type applied in the proposed ANFIS model is the Gaussian curve membership function. The optimal ANFIS hyperparameters were selected manually: each hyperparameter was varied over its different types or values, and the model accuracy and trend analysis were then checked. Finally, the hyperparameters that gave the highest accuracy with the proper trend analysis were selected as optimal, as shown in Table 3.
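The manual selection rule described above, highest accuracy among the settings whose trends are physically correct, can be expressed as a short Python sketch. The function name, the dictionary keys, and every number in the trial log are hypothetical and serve only to illustrate the rule; they are not results from this study.

```python
def select_hyperparameters(candidates):
    """Return the lowest-AAPRE candidate among those with correct trends, or None."""
    valid = [c for c in candidates if c["trends_ok"]]
    if not valid:
        return None
    return min(valid, key=lambda c: c["aapre"])

# Hypothetical trial log: all values are invented for illustration only.
trials = [
    {"range_of_influence": 0.30, "aapre": 5.1, "trends_ok": False},
    {"range_of_influence": 0.50, "aapre": 6.4, "trends_ok": True},
    {"range_of_influence": 0.70, "aapre": 8.9, "trends_ok": True},
]
best = select_hyperparameters(trials)
```

In this invented log, the first setting has the lowest error but is rejected because its trends are wrong, which is the situation the trend check is meant to catch.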

Results and discussion
The ANFIS model was evaluated by conducting two tests. First, the proposed ANFIS model was investigated by conducting TrA to ensure that all inputs follow the proper physical behavior. After that, the ANFIS model and the studied correlations were compared. Statistical error analysis, namely the correlation coefficient (R), standard deviation (SD), average percent relative error (APRE), average absolute percentage relative error (AAPRE), and root mean square error (RMSE), was performed to show the performance of the ANFIS and studied models.

Trend analysis (TrA)
Trend analysis (TrA) can be used to study the reliability of models. TrA is applied by varying the studied input between its minimum and maximum values while keeping the other inputs constant at their mean values. The studied input, such as Rs, is plotted on the x-axis and the output Pb on the y-axis [27,48-50]. TrA is an essential part of this work, as some researchers have used ANFIS without applying trend analysis [40]. Without trend analysis, an ANFIS model may show misleadingly high accuracy. As a result, models developed without trend analysis should not be considered reliable tools. The trend analysis was conducted for the ANFIS model and the 21 studied models to examine the relationships between the inputs (Rs, γg, API, Tf) and the output Pb and to show the physical behavior.

Table 3. Descriptions of the optimal ANFIS model hyperparameters.
Parameter | Description/value
Fuzzy structure | Sugeno-type
Output membership function | Linear
Cluster centre's range of influence | 0.459
Number of inputs | 4
Number of outputs | 1
In the TrA study, the four independent variables (Rs, γg, API, Tf) were selected because most previous models used these variables; the oil formation volume factor was not considered in our model because it is used only by [13,16,24]. The TrA was performed to check that the relationships between the Rs, γg, API, and Tf and the Pb show the actual physical behavior of the studied parameters and to validate the ANFIS model. Fig 5 presents the Rs TrA for the ANFIS and all existing models. As shown in Fig 5, the ANFIS and all the previous models show the proper relationship between the Rs and the Pb: increasing the Rs increases the Pb. However, Farshad's [23] and Almehaideb's [13] correlations give negative Pb values of -812.6 and -207.5 psi, respectively, at an Rs of 26 SCF/STB (Fig 5) because these correlations were built for Rs ranges of 217 to 1406 and 128 to 3871 SCF/STB, respectively, which do not include this value. Fig 6 indicates that the developed ANFIS model follows the proper relationship between the Rs and the Pb, showing correct physical behavior. Li et al. [51] showed that increasing the Rs increased the Pb.
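The sweep behind each trend plot (vary one input from its minimum to its maximum while the other inputs stay at their mean values) can be sketched as follows in Python. The helper names, the input ranges, the mean values, and both toy models are assumptions made for illustration; `toy_pb` is a stand-in with the expected directions and is not any published correlation or the ANFIS model itself.

```python
def trend_sweep(model, ranges, means, name, steps=20):
    """Vary one input from its minimum to its maximum, others fixed at their means."""
    lo, hi = ranges[name]
    points = []
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        inputs = dict(means)
        inputs[name] = x
        points.append((x, model(**inputs)))
    return points

def trend_direction(points, tol=1e-9):
    """Classify the swept output as increasing, decreasing, constant, or non-monotonic."""
    diffs = [b[1] - a[1] for a, b in zip(points, points[1:])]
    if all(d > tol for d in diffs):
        return "increasing"
    if all(d < -tol for d in diffs):
        return "decreasing"
    if all(abs(d) <= tol for d in diffs):
        return "constant"
    return "non-monotonic"

def toy_pb(rs, gamma_g, api, tf):
    """Hypothetical stand-in model with the physically expected directions."""
    return 5.0 * rs / gamma_g - 20.0 * api + 2.0 * tf

def rs_only_pb(rs, gamma_g, api, tf):
    """Hypothetical model that ignores every input except Rs."""
    return 3.0 * rs + 100.0

# Hypothetical input ranges and mean values (not taken from Table 2).
ranges = {"rs": (26.0, 3000.0), "gamma_g": (0.6, 1.4), "api": (20.0, 50.0), "tf": (80.0, 300.0)}
means = {"rs": 600.0, "gamma_g": 0.9, "api": 35.0, "tf": 180.0}

directions = {n: trend_direction(trend_sweep(toy_pb, ranges, means, n)) for n in ranges}
flat = trend_direction(trend_sweep(rs_only_pb, ranges, means, "gamma_g"))
```

A model that takes only Rs as input returns a constant trend for the other variables, and a model whose output first falls and then rises is flagged as non-monotonic; both cases correspond to the improper trends discussed in this section.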
The TrA of γg for the ANFIS and all current models is shown in Fig 7. The ANFIS and most existing models show that the Pb decreases as the γg increases, which is the proper relationship between the γg and the Pb; nevertheless, Hanafy et al.'s [21] correlation shows that changing the γg does not change the Pb, as indicated by the constant trend. This indicates an incorrect relationship between the γg and the Pb because γg was not considered as

an input in their model. Gomaa's [19] correlation shows that the Pb increases slightly with increasing γg, which indicates an improper γg trend. Omar and Todd's [24] correlation shows that the Pb decreases and then increases with increasing γg, which is also an improper relationship between the γg and the Pb. Therefore, Omar and Todd's [24], Hanafy et al.'s [21], and Gomaa's [19] models represent incorrect relationships between the γg and the Pb and, hence, improper physical behavior for the γg trend. Fig 8 illustrates the correct γg trend for the ANFIS model. Al-Shammasi [27] showed that increasing the γg decreases the Pb.
Fig 9 shows the TrA of API for the ANFIS and all current models. The ANFIS and most models display the proper relationship between the API and the Pb: the higher the API, the lower the Pb (Fig 9). However, the Dokla and Osman [12], Hanafy et al. [21], and Gomaa [19] models do not show the correct relationship between the API and the Pb, indicating incorrect physical behavior. Dokla and Osman's [12] correlation shows that the Pb decreases only slightly with increasing API (Fig 9). Gomaa's [19] correlation also shows only a slight drop in the Pb with increasing API (Fig 9). Hanafy et al.'s [21] equation shows that the Pb stays constant as the API changes (Fig 9). Petrosky and Farshad's [8] correlation gives Pb values of -37.37 psi and -145.91 psi at 48.11 and 51.7 °API, respectively (Fig 9), because these API values fall outside the range over which they developed the equation. The ANFIS model presents the correct relationship between the API and the Pb, indicating proper physical behavior, as shown in Fig 10. Al-Shammasi [27] also showed that increasing the API decreases the Pb.
The TrA of the Tf for the ANFIS and all current models is illustrated in Fig 11. As shown in Fig 11, the ANFIS and most current models follow the proper relationship between the Tf and the Pb: increasing the Tf increases the Pb. Nonetheless, Dokla and Osman's [12] equation indicates that the Pb declines with increasing Tf, which is an incorrect relationship between the Tf and the Pb. Hanafy et al.'s [21] correlation also gives a constant Pb with increasing Tf, indicating an improper relationship between the Tf and the Pb. Dindoruk and Christman's [10] and Arabloo et al.'s [28] correlations show that the Pb changes only slightly with increasing Tf, indicating incorrect physical behavior for the Tf trend. The correct Tf trend for the proposed

ANFIS model is clearly shown in Fig 12. Increasing the temperature can lower the gas density; therefore, a higher temperature can increase the Pb.
From the TrA study, we can conclude that all independent parameters (Rs, γg, API, Tf) of the ANFIS model show the proper relationships with the Pb, indicating correct physical behavior; however, Dokla and Osman's [12], Omar and Todd's [24], Hanafy et al.'s [21], and

Gomaa's [19] correlations show improper relationships between the independent parameters and the Pb, indicating incorrect physical behavior. Petrosky and Farshad's [8] and Almehaideb's [13] correlations give some negative Pb values because the Rs and API inputs that produce these negative values lie outside the ranges used in their studies.
Fig 13 shows the cross-plot for the training datasets of the ANFIS model. Most training data lie close to the 45° line, indicating that the ANFIS model is highly accurate for the training datasets. The (R²) for the training datasets of the ANFIS model is

ANFIS and all current models studied in this paper. As shown in Fig 15, the ANFIS model is the most accurate model, with an R² of 0.9878, compared to all studied models.

Statistical error analysis
Statistical error analysis was used along with trend analysis and cross-plotting analysis to validate the proposed ANFIS model and describe its performance. In addition, the ANFIS model was compared against the 21 studied models that follow the correct physical behavior. The statistical error measures applied in this study are the R, RMSE, SD, APRE, AAPRE, and the maximum and minimum absolute percent relative errors (Emax and Emin). The statistical criteria are explained in the appendix (S1 Appendix). The AAPRE and R were used in this research as the leading indicators to compare the ANFIS model's accuracy with that of the current models.
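For readers without access to S1 Appendix, the listed measures can be sketched in Python as below. The definitions are assumptions based on forms commonly used in the PVT correlation literature, not a reproduction of the appendix: the relative error is taken as measured minus predicted divided by measured, and SD is taken as the standard deviation of the fractional relative error about zero. The sample values are invented.

```python
import math

def error_metrics(actual, predicted):
    """Return common error statistics for paired measured and predicted values."""
    n = len(actual)
    # Relative error as a fraction; the sign convention (measured minus predicted) is assumed.
    rel = [(a - p) / a for a, p in zip(actual, predicted)]
    pct = [100.0 * e for e in rel]
    abs_pct = [abs(e) for e in pct]
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "APRE": sum(pct) / n,
        "AAPRE": sum(abs_pct) / n,
        "RMSE": math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n),
        # SD of the fractional relative error about zero (one assumed definition).
        "SD": math.sqrt(sum(e ** 2 for e in rel) / (n - 1)),
        "R": cov / math.sqrt(var_a * var_p),
        "Emin": min(abs_pct),
        "Emax": max(abs_pct),
    }

# Invented measured and predicted Pb values in psi, for illustration only.
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
stats = error_metrics(measured, predicted)
```

APRE keeps the sign of each error, so positive and negative deviations can cancel, whereas AAPRE cannot; this may be why AAPRE, together with R, serves as a leading indicator.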
The ANFIS and existing models were compared by plotting the AAPRE and R (Fig 16). As displayed in Fig 16,

Conclusions
With 760 global datasets, the ANFIS model was developed together with trend analysis to robustly and accurately predict the Pb. In addition, the ANFIS model's accuracy was compared with that of 21 existing models using statistical error analysis. From this research, we can conclude the following:
• The trend analysis results indicate that the ANFIS model describes the correct relationships between the independent parameters (Rs, γg, API, Tf) and the dependent parameter Pb, showing proper physical behavior.
• Some previous correlations fail to represent the proper relationships between the independent parameters and the Pb, indicating incorrect physical behavior.
• The proposed ANFIS model outperformed all 21 existing models, with the lowest AAPRE of 6.38%, APRE of -0.99%, RMSE of 9.73 psi, SD of 0.074 psi, Emin of 0.021%, and Emax of 50.19%, and the highest R of 0.9939, compared to the 21 studied correlations that follow the correct physical behavior. The ANFIS model shows better results than other models because of its

combination of FL and ANN capabilities and its better learning ability. ANFIS can also perform highly non-linear mapping.
• Data randomization was applied to prevent the model from overfitting or underfitting and to obtain a robust and accurate ANFIS model for predicting the Pb.

Acknowledgments
Special thanks to the Centre of Research in Enhanced Oil Recovery (COREOR), Petroleum Engineering Department, Universiti Teknologi PETRONAS, for supporting this work.